Prediction of O-Glycosylation Sites in Proteins using PSO-Based Data Balancing and Random Forest
Authors
Abstract
O-glycosylation of mammalian proteins is one of the most important post-translational modifications (PTMs). Hence, there is significant interest in developing computational methods for reliable prediction of O-glycosylation sites from amino acid sequences. One particular challenge in training classifiers is that the available dataset is highly imbalanced, which makes classification performance for the minority class unsatisfactory. Traditional sampling approaches generally rely on random re-sampling from a given dataset. However, these methods cannot use all the information available in the training set, and they increase the false positive rate. This paper proposes a new approach for predicting O-glycosylation sites based on Particle Swarm Optimization (PSO) and Random Forest (RF). PSO is used as an evolutionary under-sampling technique for balancing the dataset, and Random Forest is used as the classifier. Results from the proposed approach and from related studies show that the proposed approach outperforms the other approaches on the dataset used in the experiments. [Hebatallah A. Hassan, M. B. Abdelhalim, Amr Badr. Prediction of O-Glycosylation Sites in Proteins using PSO-Based Data Balancing and Random Forest. Life Sci J 2014;11(12):1019-1025]. (ISSN:1097-8135). http://www.lifesciencesite.com. 175
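The abstract does not give the details of the PSO step, so the following is only a minimal sketch of how PSO-based under-sampling could look, not the authors' implementation. It assumes a binary PSO in which each particle is a bit mask over the majority-class samples (1 = keep), and it uses a nearest-centroid classifier as a lightweight stand-in for the Random Forest when scoring a candidate subset. The function names, the size penalty, and all parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are hypothetical choices made for illustration.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def balanced_accuracy(maj_train, min_train, maj_val, min_val):
    """Stand-in fitness (assumption): nearest-centroid classifier, mean per-class recall."""
    c_maj, c_min = centroid(maj_train), centroid(min_train)
    hits_min = sum(math.dist(x, c_min) < math.dist(x, c_maj) for x in min_val)
    hits_maj = sum(math.dist(x, c_maj) <= math.dist(x, c_min) for x in maj_val)
    return 0.5 * (hits_min / len(min_val) + hits_maj / len(maj_val))

def pso_undersample(majority, minority, maj_val, min_val,
                    n_particles=10, n_iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over keep/drop masks for the majority class (illustrative only)."""
    rng = random.Random(seed)
    dim = len(majority)

    def fitness(bits):
        kept = [m for m, b in zip(majority, bits) if b]
        if not kept:
            return 0.0
        # Hypothetical penalty that nudges the kept subset toward the minority size.
        size_penalty = 0.1 * abs(len(kept) - len(minority)) / dim
        return balanced_accuracy(kept, minority, maj_val, min_val) - size_penalty

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp velocity
                # Sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Synthetic demo data (not from the paper): two Gaussian clouds, 40 vs 8 samples.
_rng = random.Random(1)

def cloud(n, mu):
    return [[_rng.gauss(mu, 1.0), _rng.gauss(mu, 1.0)] for _ in range(n)]

majority, minority = cloud(40, 0.0), cloud(8, 3.0)
maj_val, min_val = cloud(20, 0.0), cloud(5, 3.0)
mask, score = pso_undersample(majority, minority, maj_val, min_val)
kept = [m for m, b in zip(majority, mask) if b]
print(len(kept), "of", len(majority), "majority samples kept")
```

In a pipeline like the one the abstract describes, the `kept` majority samples together with all minority samples would form the balanced training set passed to a Random Forest classifier; the stand-in fitness here would be replaced by whatever validation score the classifier of interest produces.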
Similar resources
Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique
O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed. However, there remains a need for even better prediction tools. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification ac...
Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies
The telecommunication industry faces fierce competition to retain customers, and therefore requires an efficient churn prediction model to monitor the customer’s churn. Enormous size, high dimensionality and imbalanced nature of telecommunication datasets are main hurdles in attaining the desired performance for churn prediction. In this study, we investigate the significance of a Particle Swar...
GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome
MOTIVATION Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes (BPs) such as cellular communication, ligand recognition and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated. However, it is still a significant challenge to identify glycosylat...
Accuracy Improvement of Mood Disorders Prediction using a Combination of Data Mining and Meta-Heuristic Algorithms
Introduction: Since the delay or mistake in the diagnosis of mood disorders due to the similarity of their symptoms hinders effective treatment, this study aimed to accurately diagnose mood disorders including psychosis, autism, personality disorder, bipolar, depression, and schizophrenia, through modeling and analyzing patients' data. Method: Data collected in this applied developmental resear...